{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data visualization\n",
    "\n",
    "[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/geoai/blob/main/docs/examples/data_viz.ipynb)\n",
    "\n",
    "This notebook demonstrates how to work with geospatial imagery data using TorchGeo and GeoAI. We'll explore how to load data, sample it using different strategies, and visualize the results.\n",
    "\n",
    "## Install Package\n",
    "To use the `geoai-py` package, ensure it is installed in your environment. Uncomment the command below if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install geoai-py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing Required Libraries\n",
    "\n",
    "First, we import the necessary libraries for our geospatial data visualization workflow:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchgeo.datasets import NAIP\n",
    "from torchgeo.samplers import RandomGeoSampler, GridGeoSampler\n",
    "from geoai.utils import view_image, view_raster, dict_to_image\n",
    "from geoai.download import download_naip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download NAIP imagery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "root = \"naip_data\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bbox = (-117.6029, 47.65, -117.5936, 47.6563)\n",
    "downloaded_files = download_naip(\n",
    "    bbox=bbox,\n",
    "    output_dir=root,\n",
    "    max_items=1,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **torchgeo.datasets.NAIP**: Provides access to the National Agriculture Imagery Program (NAIP) dataset, which offers high-resolution aerial imagery across the United States.\n",
    "- **torchgeo.samplers**: Contains sampling strategies for geospatial data:\n",
    "  - **RandomGeoSampler**: Samples random patches from the dataset\n",
    "  - **GridGeoSampler**: Samples patches in a grid pattern with specified stride\n",
    "- **geoai.utils**: Custom utility functions for visualization:\n",
    "  - **view_image**: Visualizes tensor images\n",
    "  - **view_raster**: Displays georeferenced data on an interactive map\n",
    "  - **dict_to_image**: Converts dictionary representation to image format\n",
    "\n",
    "## Setting Up the Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load the NAIP dataset from the specified root directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = NAIP(root)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examine the dataset object to understand its properties:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will display information about the NAIP dataset including available imagery dates, coverage area, and other metadata.\n",
    "\n",
    "Check the Coordinate Reference System (CRS) used by the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.crs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The CRS defines how the geospatial data is projected onto a coordinate system, which is essential for accurate visualization and analysis.\n",
    "\n",
    "## Random Sampling of Geospatial Data\n",
    "\n",
    "Create a random sampler to extract patches from the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_sampler = RandomGeoSampler(dataset, size=256, length=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This creates a sampler that will randomly select 1000 patches, each 256x256 pixels in size. This sampling strategy is commonly used for training machine learning models where you need a diverse set of examples.\n",
    "\n",
    "Extract a bounding box from the random sampler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_bbox = next(iter(train_sampler))\n",
    "train_bbox"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bounding box contains the coordinates defining the spatial extent of our randomly sampled patch.\n",
    "\n",
    "Load the actual image data corresponding to the randomly selected bounding box:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_image = dataset[next(iter(train_sampler))][\"image\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examine the complete data dictionary returned for a sample, which includes both the image and metadata:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset[next(iter(train_sampler))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This returns a dictionary containing the image tensor and associated metadata such as the bounding box, CRS, and other properties.\n",
    "\n",
    "## Visualizing Randomly Sampled Data\n",
    "\n",
    "Display the randomly sampled image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view_image(\n",
    "    train_image, transpose=True, scale_factor=(1 / 250), title=\"Random GeoSampler\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **transpose=True**: Rearranges the dimensions for proper display (from [C,H,W] to [H,W,C])\n",
    "- **scale_factor=(1/250)**: Scales the pixel values for better visualization\n",
    "- **title=\"Random GeoSampler\"**: Adds a descriptive title to the plot\n",
    "\n",
    "## Grid Sampling of Geospatial Data\n",
    "\n",
    "Create a grid sampler to extract patches in a systematic pattern:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_sampler = GridGeoSampler(dataset, size=256, stride=128)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This sampler extracts 256x256 pixel patches in a grid pattern with a stride of 128 pixels, meaning patches will overlap by 128 pixels. Grid sampling is typically used for inference or testing, where systematic coverage of the area is important.\n",
    "\n",
    "Extract a bounding box from the grid sampler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_bbox = next(iter(test_sampler))\n",
    "test_bbox"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load the image data for a patch selected by the grid sampler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_image = dataset[next(iter(test_sampler))][\"image\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing Grid Sampled Data\n",
    "\n",
    "Display the image from the grid sampler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view_image(test_image, transpose=True, scale_factor=(1 / 250), title=\"Grid GeoSampler\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The visualization shows a systematically sampled patch from the dataset.\n",
    "\n",
    "## Advanced Visualization with Geospatial Context\n",
    "\n",
    "Load a complete data sample including all metadata:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = dataset[next(iter(test_sampler))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualize the raster data on an interactive map with Esri.WorldImagery imagery as the background:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view_raster(data, basemap=\"Esri.WorldImagery\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This interactive visualization places the sampled data in its real-world geographic context, allowing you to see how it aligns with the Esri.WorldImagery imagery.\n",
    "\n",
    "## Key Takeaways\n",
    "\n",
    "1. **TorchGeo** provides a flexible framework for working with geospatial datasets like NAIP.\n",
    "2. Different sampling strategies (random vs. grid) serve different purposes in geospatial machine learning workflows.\n",
    "3. Visualization tools help understand the data in both pixel space (view_image) and geographic space (view_raster).\n",
    "4. Working with geospatial data requires attention to coordinate reference systems (CRS) and proper handling of georeferenced data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
